# Standards used in Language Technology and Linguistics

## Language related ISO standards
* http://www.iso.org/iso/home/store/catalogue_tc/catalogue_tc_browse.htm?commid=48124

## Language and Language Family Identification
* ISO 639-1
* ISO 639-2
* ISO 639-3
* ISO 639-4
* ISO 639-5
* ISO 639-6

* Language tags as defined by the Internet Engineering Task Force (IETF)
  * BCP 47: [Best Current Practice 47](https://tools.ietf.org/html/bcp47), which currently comprises RFC 5646 (tag syntax) and RFC 4647 (tag matching)
  * RFC 5646 superseded RFC 4646, which in turn superseded [RFC 3066](https://www.ietf.org/rfc/rfc3066.txt). RFC 5646 added ISO 639-3 codes as language subtags, so standards that depend on any of these three IETF documents can now use ISO 639-3 codes.
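
To make the shape of an IETF language tag concrete, here is a minimal sketch that splits a tag into its language, script, and region subtags. It assumes a simplified `language[-script][-region]` grammar; full BCP 47 tags may also carry extended-language, variant, extension, and private-use subtags, which are not handled. The names `TAG_PATTERN` and `split_language_tag` are hypothetical and chosen only for illustration.

```python
import re

# Simplified pattern: language[-script][-region].
# Assumption: variants, extensions, and private-use subtags are ignored.
TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"              # ISO 639 code, 2 or 3 letters
    r"(?:-(?P<script>[A-Za-z]{4}))?"             # ISO 15924 script code, 4 letters
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"   # 2-letter or 3-digit region code
)


def split_language_tag(tag):
    """Return the subtags in conventional case, or None if the tag does not fit."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


print(split_language_tag("zh-Hant-TW"))
print(split_language_tag("yue-HK"))  # "yue" is a three-letter ISO 639-3 code
```

A tag that does not fit the simplified grammar, such as one with a private-use subtag, returns `None` rather than a partial result.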

## Character Encoding 
* Unicode 
  * UTF-8
  * UTF-16
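
As a rough illustration of how the two encoding forms differ, the sketch below counts the bytes needed to store the same character in UTF-8 and in UTF-16. The helper name `encoded_lengths` is hypothetical, and the sample characters are arbitrary choices made for this example.

```python
def encoded_lengths(text):
    """Return the number of bytes needed to store text in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte order mark that plain "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }


# U+0061 (a), U+00E9 (e with acute), U+20AC (euro sign), U+10348 (Gothic letter hwair)
for sample in ["a", "\u00e9", "\u20ac", "\U00010348"]:
    print(f"U+{ord(sample):04X}", encoded_lengths(sample))
```

UTF-8 uses one to four bytes per code point, while UTF-16 uses two bytes for code points in the Basic Multilingual Plane and four bytes (a surrogate pair) for those outside it.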

## Script Identification Standards
* ISO 15924: codes for the representation of names of scripts. [More on Unicode Website](http://unicode.org/iso15924/codelists.html)
* ISO 15919: a standard for the transliteration of Indic scripts into Roman (Latin) script. [More on Wikipedia](http://en.wikipedia.org/wiki/ISO_15919)

## Metadata Standards
* OLAC 1.1 [OLAC: the Open Language Archives Community](http://www.language-archives.org/REC/language.html)
* Dublin Core Metadata Initiative: [DCMI Metadata Term](http://purl.org/dc/elements/1.1/language) for language, via IETF's RFC 4646 (now superseded by RFC 5646).
* MARC library codes.
* MODS (Metadata Object Description Schema) [library codes](http://www.loc.gov/standards/mods/v3/mods-userguide-elements.html): Incorporates IETF's RFC 3066 (now superseded by RFC 5646).
* DOAP (Description of a Project): metadata for an application profile. [See DCMI discussion](http://dublincore.org/groups/tools/map.shtml) and [see GitHub](https://github.com/edumbill/doap). See also the paper: Severiens, Thomas & Greenberg, Jane. 2007. The DCMI Tools application profile. 2007 Proc. Int’l Conf. on Dublin Core and Metadata Applications. http://dcpapers.dublincore.org/pubs/article/view/874

## i18n / Locale data
* Unicode's CLDR (Common Locale Data Repository): uses several hundred codes from ISO 639-3 that are not included in ISO 639-2.

## Text Markup Formats

### Documents
* [HTML5](http://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes): via IETF's BCP 47.
* Text Encoding Initiative [TEI](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-language.html): via IETF's BCP 47.
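
As an illustration of how BCP 47 tags appear in document markup, the following sketch collects the `lang` attributes from an HTML fragment using Python's standard `html.parser` module. The class name `LangCollector`, the function `collect_lang_tags`, and the sample markup are assumptions made for this example; TEI documents use `xml:lang` and would need an XML parser instead.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Record (element name, language tag) pairs for every lang attribute seen."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases element and attribute names; values are kept as written.
        for name, value in attrs:
            if name == "lang" and value:
                self.found.append((tag, value))


def collect_lang_tags(html_text):
    collector = LangCollector()
    collector.feed(html_text)
    collector.close()
    return collector.found


SAMPLE = '<html lang="en"><body><p>Hello</p><p lang="fr-CA">Bonjour</p></body></html>'
print(collect_lang_tags(SAMPLE))  # [('html', 'en'), ('p', 'fr-CA')]
```

The sketch only reports the attribute values; it does not check them against the BCP 47 grammar or resolve the language an element inherits from its ancestors.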

### Corpora
* [NLP Annotation Format (NAF)](http://www.newsreader-project.eu/files/2013/01/techreport.pdf), formerly known as [KAF (Knowledge/Kyoto Annotation Format)](http://weblab.iit.cnr.it/kyoto/xmlgroup.iit.cnr.it/kyoto/indexdd46.html)
* [NLP Interchange Format (NIF)](http://persistence.uni-leipzig.org/nlp2rdf/)

### Lexicons
* Lexical Markup Framework (LMF, ISO 24613): ISO specification for the representation of machine-readable dictionaries.
